problems with a sub-select (takes donkeys' years) - Mailing list pgsql-general

From Stuart Rison
Subject problems with a sub-select (takes donkeys' years)
Date
Msg-id v04003a05b325562e3c99@[128.40.242.190]
Whole thread Raw
Responses Re: [GENERAL] problems with a sub-select (takes donkeys' years)
List pgsql-general
Hello,

I've got two SELECT statements which work fine (take about 5secs each) when
I do them sequentially but take ages (around 5mins) if in a statement with
a sub-SELECT.

The individual SELECTs are:

SELECT p1.brecard_id FROM
    malignant_pathologies p1,
    malignant_pathologies p2,
    malignant_pathologies p3
WHERE p1.code='MAPH'
AND p1.brecard_id=p2.brecard_id
AND p2.code='AMCA'
AND p3.brecard_id=p1.brecard_id
AND p3.code='LOCA';

brecard_id
----------------
DSTL12031999016
DPHA12031999017
DCCH12031999056
DUCH12031999059
DUCH12031999063
DCCH12031999077
DUCH12031999098
(7 rows)

SELECT brecard_id,count(brecard_id) FROM
    malignant_pathologies
WHERE brecard_id IN
('DSTL12031999016','DPHA12031999017','DCCH12031999056','DUCH12031999059','DUCH12
031999063','DCCH12031999077','DUCH12031999098')
GROUP BY brecard_id;

brecard_id      |count
----------------+-----
DCCH12031999056 |   10
DCCH12031999077 |   12
DPHA12031999017 |   11
DSTL12031999016 |   11
DUCH12031999059 |   13
DUCH12031999063 |    6
DUCH12031999098 |   14
(7 rows)

(This is just an example query, what I'm trying to do here is use the list
of brecard_id's generated by the previous query).

Both of these execute in about 5 seconds (on 1000 rows)

When I put them together as:

SELECT brecard_id,count(brecard_id) FROM
    malignant_pathologies
WHERE brecard_id IN (
    SELECT p1.brecard_id FROM
        malignant_pathologies p1,
        malignant_pathologies p2,
        malignant_pathologies p3
    WHERE p1.code='MAPH'
    AND p1.brecard_id=p2.brecard_id
    AND p2.code='AMCA'
    AND p3.brecard_id=p1.brecard_id
    AND p3.code='LOCA');

It takes around 5mins for the query to complete!  The EXPLAIN for the quey is:

NOTICE:  QUERY PLAN:

Aggregate  (cost=4.27 size=0 width=0)
  ->  Group  (cost=4.27 size=0 width=0)
        ->  Sort  (cost=4.27 size=0 width=0)
              ->  Seq Scan on malignant_pathologies  (cost=4.27 size=99
width=12)
                    SubPlan
                      ->  Nested Loop  (cost=8.27 size=1 width=36)
                            ->  Nested Loop  (cost=6.27 size=1 width=24)
                                  ->  Seq Scan on malignant_pathologies p3
(cost=4.27 size=1 width=12)
                                  ->  Index Scan using
malignant_pathologies_pkey on malignant_pathologies p1  (cost=2.00 size=1
width=12)
                            ->  Index Scan using malignant_pathologies_pkey
on malignant_pathologies p2  (cost=2.00 size=1 width=12)

EXPLAIN

[Which is not different from putting together the EXPLAINs from each of the
individual queries]

Can anyone explain why the sub-query form takes so long?

Regards,

Stuart.




+-------------------------+--------------------------------------+
| Stuart Rison            | Ludwig Institute for Cancer Research |
+-------------------------+ 91 Riding House Street               |
| Tel. (0171) 878 4041    | London, W1P 8BT, UNITED KINGDOM.     |
| Fax. (0171) 878 4040    | stuart@ludwig.ucl.ac.uk              |
+-------------------------+--------------------------------------+



pgsql-general by date:

Previous
From: Herouth Maoz
Date:
Subject: Re: [GENERAL] Returning an integer from a date
Next
From: Herouth Maoz
Date:
Subject: Re: [GENERAL] problems with a sub-select (takes donkeys' years)